Multi-Armed Bandit Problem and Its Applications in Intelligent Tutoring Systems
Authors
Abstract
Master of Complex Systems thesis by Minh-Quan Nguyen. In this project, we propose solutions to the exploration vs. exploitation problem in Intelligent Tutoring Systems (ITSs) using multi-armed bandit (MAB) algorithms. On one hand, an ITS wants to recommend the best available learning objects to its learners; on the other hand, it wants learners to try new objects so that it can learn their characteristics and make better recommendations in the future. This is the exploration vs. exploitation problem in ITSs. We model these problems as MAB problems and consider the optimal strategy, the Gittins index strategy, together with two other MAB strategies: Upper Confidence Bound (UCB) and Thompson Sampling. We apply these strategies to two problems: recommending courses to learners and scheduling exercises. We evaluate the strategies using simulation.
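To make the recommendation loop concrete, the sketch below simulates choosing among a few learning objects with Bernoulli "success" rewards using UCB1 and Thompson Sampling. It is a minimal illustration, not the thesis implementation: the success probabilities, horizon, and Beta(1, 1) priors are hypothetical choices made for the example.

```python
# Minimal sketch: UCB1 and Thompson Sampling choosing among learning objects.
# TRUE_SUCCESS and HORIZON are hypothetical values for illustration only.
import math
import random

TRUE_SUCCESS = [0.35, 0.55, 0.70]   # hypothetical per-object success rates
HORIZON = 5000

def ucb1(n_arms, horizon):
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:                       # pull each arm once first
            arm = t - 1
        else:
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        r = 1.0 if random.random() < TRUE_SUCCESS[arm] else 0.0
        counts[arm] += 1
        sums[arm] += r
        total += r
    return total

def thompson(n_arms, horizon):
    alpha = [1.0] * n_arms                    # Beta(1, 1) priors
    beta = [1.0] * n_arms
    total = 0.0
    for _ in range(horizon):
        samples = [random.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        arm = samples.index(max(samples))
        r = 1.0 if random.random() < TRUE_SUCCESS[arm] else 0.0
        alpha[arm] += r
        beta[arm] += 1.0 - r
        total += r
    return total

print("UCB1 reward:    ", ucb1(len(TRUE_SUCCESS), HORIZON))
print("Thompson reward:", thompson(len(TRUE_SUCCESS), HORIZON))
```

Both policies end up pulling the best object most of the time while still occasionally sampling the others, which is the exploration vs. exploitation trade-off described above.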
Similar resources
Finite dimensional algorithms for the hidden Markov model multi-armed bandit problem
The multi-arm bandit problem is widely used in scheduling of traffic in broadband networks, manufacturing systems and robotics. This paper presents a finite dimensional optimal solution to the multi-arm bandit problem for Hidden Markov Models. The key to solving any multi-arm bandit problem is to compute the Gittins index. In this paper a finite dimensional algorithm is presented which exactly ...
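As a companion to the remark that solving a bandit problem amounts to computing the Gittins index, here is a rough sketch of how the index can be approximated in the much simpler case of a Bernoulli arm with a Beta(a, b) posterior, using the retirement (calibration) formulation and bisection. It is only an illustration under assumed parameters (GAMMA, N_MAX) and is not the finite-dimensional HMM algorithm described in the paper.

```python
# Approximate Gittins index of a Bernoulli arm with a Beta(a, b) posterior.
# Illustrative only: GAMMA and N_MAX are assumptions, and the state space is
# truncated, so the result is an approximation of the true index.
GAMMA = 0.9        # discount factor
N_MAX = 150        # truncate Beta states at a + b = N_MAX

def gittins_index(a, b, tol=1e-4):
    """Bisect on the per-step retirement reward lam: at the index,
    retiring forever and continuing to pull the arm are equally attractive."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        cache = {}

        def V(x, y):
            # Optimal value in posterior state Beta(x, y) with retirement reward lam.
            if (x, y) in cache:
                return cache[(x, y)]
            p = x / (x + y)
            if x + y >= N_MAX:
                cont = p / (1.0 - GAMMA)      # crude frontier approximation
            else:
                cont = (p * (1.0 + GAMMA * V(x + 1, y))
                        + (1.0 - p) * GAMMA * V(x, y + 1))
            v = max(lam / (1.0 - GAMMA), cont)
            cache[(x, y)] = v
            return v

        if V(a, b) > lam / (1.0 - GAMMA) + 1e-12:
            lo = lam   # continuing is strictly better, so the index exceeds lam
        else:
            hi = lam
    return 0.5 * (lo + hi)

print(gittins_index(1, 1))   # index of an arm with a uniform Beta(1, 1) prior
```

The Gittins policy then simply pulls, at each step, the arm whose current state has the highest index.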
Bandit Problems
We survey the literature on multi-armed bandit models and their applications in economics. The multi-armed bandit problem is a statistical decision model of an agent trying to optimize his decisions while improving his information at the same time. This classic problem has received much attention in economics as it concisely models the trade-off between exploration (trying out each arm to find ...
Multi-armed bandit problem with known trend
We consider a variant of the multi-armed bandit model, which we call multi-armed bandit problem with known trend, where the gambler knows the shape of the reward function of each arm but not its distribution. This new problem is motivated by different on-line problems like active learning, music and interface recommendation applications, where when an arm is sampled by the model the received re...
The Irrevocable Multiarmed Bandit Problem
This paper considers the multi-armed bandit problem with multiple simultaneous arm pulls and the additional restriction that we do not allow recourse to arms that were pulled at some point in the past but then discarded. This additional restriction is highly desirable from an operational perspective and we refer to this problem as the ‘Irrevocable Multi-Armed Bandit’ problem. We observe that na...
The multi-armed bandit problem with covariates
We consider a multi-armed bandit problem in a setting where each arm produces a noisy reward realization which depends on an observable random covariate. As opposed to the traditional static multi-armed bandit problem, this setting allows for dynamically changing rewards that better describe applications where side information is available. We adopt a nonparametric model where the expected rewa...
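As a toy illustration of side information (and not the nonparametric estimator studied in the paper), the sketch below bins a one-dimensional covariate and runs an independent UCB1 learner inside each bin; the reward functions, number of bins, and horizon are all hypothetical.

```python
# Illustrative covariate bandit: bin the covariate X in [0, 1] and run a
# separate UCB1 learner per bin. Reward means below are hypothetical.
import math
import random

def reward(arm, x):
    means = [0.3 + 0.4 * x, 0.7 - 0.4 * x]   # hypothetical covariate-dependent means
    return 1.0 if random.random() < means[arm] else 0.0

N_ARMS, N_BINS, HORIZON = 2, 10, 20000
counts = [[0] * N_ARMS for _ in range(N_BINS)]
sums = [[0.0] * N_ARMS for _ in range(N_BINS)]
total = 0.0

for t in range(1, HORIZON + 1):
    x = random.random()                      # observe the covariate
    b = min(int(x * N_BINS), N_BINS - 1)     # nonparametric binning of X
    if 0 in counts[b]:                       # pull each arm once per bin
        arm = counts[b].index(0)
    else:
        n_bin = sum(counts[b])
        arm = max(range(N_ARMS),
                  key=lambda a: sums[b][a] / counts[b][a]
                  + math.sqrt(2 * math.log(n_bin) / counts[b][a]))
    r = reward(arm, x)
    counts[b][arm] += 1
    sums[b][arm] += r
    total += r

print("average reward:", total / HORIZON)
```

Finer bins adapt better to the covariate but leave each bin with fewer samples, the usual bias/variance trade-off in such nonparametric schemes.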